Correcting the hub occurrence prediction bias in many dimensions
Abstract
Data reduction is a common pre-processing step for k-nearest neighbor classification (kNN). The existing prototype selection methods implement different criteria for selecting relevant points to use in classification, which constitutes a selection bias. This study examines the nature of the instance selection bias in intrinsically high-dimensional data. In high-dimensional feature spaces, hubs are known to emerge as centers of influence in kNN classification. These points dominate most kNN sets and are often detrimental to classification performance. Our experiments reveal that different instance selection strategies bias the predictions of the behavior of hub-points in high-dimensional data in different ways. We propose to introduce an intermediate un-biasing step when training the neighbor occurrence models and we demonstrate promising improvements in various hubness-aware classification methods, on a wide selection of high-dimensional synthetic and real-world datasets.
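The hub effect described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it only counts, for each point, how often it appears in other points' k-nearest-neighbor lists (the k-occurrence count), and compares the skewness of that count on synthetic Gaussian data of low and high dimension. The function names `k_occurrences` and `skewness`, the Euclidean metric, and the data sizes are assumptions chosen for illustration.

```python
import numpy as np


def k_occurrences(X, k):
    """Count how often each row of X appears in the k-NN lists of the other rows."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via the Gram matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    return np.bincount(nn.ravel(), minlength=X.shape[0])


def skewness(counts):
    """Third standardized moment; large positive values indicate a few dominant hubs."""
    c = np.asarray(counts, dtype=float)
    return float(np.mean((c - c.mean()) ** 3) / c.std() ** 3)


# Illustrative comparison on synthetic data (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
for dim in (3, 100):
    X = rng.standard_normal((500, dim))
    counts = k_occurrences(X, k=5)
    print(f"d={dim}: max occurrences={counts.max()}, skewness={skewness(counts):.2f}")
```

A strongly right-skewed occurrence distribution means a small number of points sit in a disproportionate share of the k-NN sets. Those are the hubs whose behavior the abstract says is predicted differently depending on the instance selection strategy.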
Similar resources
Prediction of frost occurrence by estimating daily minimum temperature in semi-arid areas in Iran
ABSTRACT- Many fruits, vegetables and ornamental crops of tropical origin experience physiological damage when subjected to low temperatures. Protection of plants from the effects of lethally low temperatures is important in agriculture, especially in horticultural production of high value fruits and vegetables. The objective of this study was to develop a simple model to predict the daily mini...
Prediction of Severity of Delusion Based on Jumping-to-Conclusion Bias in Schizophrenia Patients
Objectives: New cognitive theories of delusions have proposed that a deficit or bias in the inference stage (a stage of normal belief formation) is significant in delusion formation. The aim of this study was to predict the severity of delusions based on jumping-to-conclusion bias in patients with schizophrenia. Methods: The sample consisted of 60 deluded patients with schizophrenia w...
Hybrid Method of Logistic Regression and Data Envelopment Analysis for Event Prediction: A Case Study (Stroke Disease)
Abstract Predictive analytics is an area of statistics that deals with extracting information from data and using it to predict trends and behavior patterns. Many mathematical models have been developed and used for prediction, and in some cases they have been found to be very strong and reliable. This paper studies different mathematical and statistical approaches for event prediction. The ...
High Throughput Interaction Data Reveals Degree Conservation of Hub Proteins
Research in model organisms relies on unspoken assumptions about the conservation of protein-protein interactions across species, yet several analyses suggest such conservation is limited. Fortunately, for many purposes the crucial issue is not global conservation of interactions, but preferential conservation of functionally important ones. An observed bias towards essentiality in highly-conne...
Optimizing a Fuzzy Green p-hub Centre Problem Using Opposition Biogeography Based Optimization
Hub networks have always been a critical issue in locating health facilities. Recently, Cocking et al. (2006) conducted a study in the Nouna health district in Burkina Faso, Africa, with a population of approximately 275,000 people living in 290 villages served by 23 health facilities. The travel times of the population to health services become extremely high during the rainy season, s...
Journal title:
- Comput. Sci. Inf. Syst.
Volume 13
Publication date: 2016